{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "134a0785",
   "metadata": {},
   "source": [
    "# Chat Over Documents with Vectara\n",
    "\n",
    "This notebook is based on the [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/chat_vector_db.html) notebook, but using Vectara as the vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "70c4e529",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from langchain.vectorstores import Vectara\n",
    "from langchain.vectorstores.vectara import VectaraRetriever\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.chains import ConversationalRetrievalChain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdff94be",
   "metadata": {},
   "source": [
    "Load in documents. You can replace this with a loader for whatever type of data you want"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "01c46e92",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "loader = TextLoader(\"../../../modules/state_of_the_union.txt\")\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "239475d2",
   "metadata": {},
   "source": [
    "We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a8930cf7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "vectorstore = Vectara.from_documents(documents, embedding=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "898b574b",
   "metadata": {},
   "source": [
    "We can now create a memory object, which is neccessary to track the inputs/outputs and hold a conversation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "af803fee",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.memory import ConversationBufferMemory\n",
    "\n",
    "memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c96b118",
   "metadata": {},
   "source": [
    "We now initialize the `ConversationalRetrievalChain`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7b4110f3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "openai_api_key = os.environ[\"OPENAI_API_KEY\"]\n",
    "llm = OpenAI(openai_api_key=openai_api_key, temperature=0)\n",
    "retriever = vectorstore.as_retriever(lambda_val=0.025, k=5, filter=None)\n",
    "d = retriever.get_relevant_documents(\n",
    "    \"What did the president say about Ketanji Brown Jackson\"\n",
    ")\n",
    "\n",
    "qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e8ce4fe9",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4c79862b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and a former top litigator in private practice, and that she will continue Justice Breyer's legacy of excellence.\""
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c697d9d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Did he mention who she suceeded\"\n",
    "result = qa({\"question\": query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "ba0678f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' Ketanji Brown Jackson succeeded Justice Breyer.'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3308b01-5300-4999-8cd3-22f16dae757e",
   "metadata": {},
   "source": [
    "## Pass in chat history\n",
    "\n",
    "In the above example, we used a Memory object to track chat history. We can also just pass it in explicitly. In order to do this, we need to initialize a chain without any memory object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "1b41a10b-bf68-4689-8f00-9aed7675e2ab",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "qa = ConversationalRetrievalChain.from_llm(\n",
    "    OpenAI(temperature=0), vectorstore.as_retriever()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83f38c18-ac82-45f4-a79e-8b37ce1ae115",
   "metadata": {},
   "source": [
    "Here's an example of asking a question with no chat history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "bc672290-8a8b-4828-a90c-f1bbdd6b3920",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6b62d758-c069-4062-88f0-21e7ea4710bf",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and a former top litigator in private practice, and that she will continue Justice Breyer's legacy of excellence.\""
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c26a83d-c945-4458-b54a-c6bd7f391303",
   "metadata": {},
   "source": [
    "Here's an example of asking a question with some chat history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "9c95460b-7116-4155-a9d2-c0fb027ee592",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = [(query, result[\"answer\"])]\n",
    "query = \"Did he mention who she suceeded\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "698ac00c-cadc-407f-9423-226b2d9258d0",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' Ketanji Brown Jackson succeeded Justice Breyer.'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0eaadf0f",
   "metadata": {},
   "source": [
    "## Return Source Documents\n",
    "You can also easily return source documents from the ConversationalRetrievalChain. This is useful for when you want to inspect what documents were returned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "562769c6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "qa = ConversationalRetrievalChain.from_llm(\n",
    "    llm, vectorstore.as_retriever(), return_source_documents=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "ea478300",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "4cb75b4e",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content='Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice.', metadata={'source': '../../../modules/state_of_the_union.txt'})"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"source_documents\"][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "669ede2f-d69f-4960-8468-8a768ce1a55f",
   "metadata": {},
   "source": [
    "## ConversationalRetrievalChain with `search_distance`\n",
    "If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "f4f32c6f-8e49-44af-9116-8830b1fcc5f2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "vectordbkwargs = {\"search_distance\": 0.9}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "1e251775-31e7-4679-b744-d4a57937f93a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "qa = ConversationalRetrievalChain.from_llm(\n",
    "    OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True\n",
    ")\n",
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa(\n",
    "    {\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "24ebdaec",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The president said that Ketanji Brown Jackson is one of the nation's top legal minds and a former top litigator in private practice, and that she will continue Justice Breyer's legacy of excellence.\n"
     ]
    }
   ],
   "source": [
    "print(result[\"answer\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99b96dae",
   "metadata": {},
   "source": [
    "## ConversationalRetrievalChain with `map_reduce`\n",
    "We can also use different types of combine document chains with the ConversationalRetrievalChain chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "e53a9d66",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains import LLMChain\n",
    "from langchain.chains.question_answering import load_qa_chain\n",
    "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "bf205e35",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_chain(llm, chain_type=\"map_reduce\")\n",
    "\n",
    "chain = ConversationalRetrievalChain(\n",
    "    retriever=vectorstore.as_retriever(),\n",
    "    question_generator=question_generator,\n",
    "    combine_docs_chain=doc_chain,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "78155887",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = chain({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "e54b5fa2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, who is one of the nation's top legal minds and a former top litigator in private practice.\""
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2fe6b14",
   "metadata": {},
   "source": [
    "## ConversationalRetrievalChain with Question Answering with sources\n",
    "\n",
    "You can also use this chain with the question answering with sources chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "d1058fd2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains.qa_with_sources import load_qa_with_sources_chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "a6594482",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_with_sources_chain(llm, chain_type=\"map_reduce\")\n",
    "\n",
    "chain = ConversationalRetrievalChain(\n",
    "    retriever=vectorstore.as_retriever(),\n",
    "    question_generator=question_generator,\n",
    "    combine_docs_chain=doc_chain,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "e2badd21",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = chain({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "edb31fe5",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and a former top litigator in private practice.\\nSOURCES: ../../../modules/state_of_the_union.txt\""
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
   "metadata": {},
   "source": [
    "## ConversationalRetrievalChain with streaming to `stdout`\n",
    "\n",
    "Output from the chain will be streamed to `stdout` token by token in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chains.llm import LLMChain\n",
    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
    "from langchain.chains.conversational_retrieval.prompts import (\n",
    "    CONDENSE_QUESTION_PROMPT,\n",
    "    QA_PROMPT,\n",
    ")\n",
    "from langchain.chains.question_answering import load_qa_chain\n",
    "\n",
    "# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n",
    "# and a separate, non-streaming llm for question generation\n",
    "llm = OpenAI(temperature=0, openai_api_key=openai_api_key)\n",
    "streaming_llm = OpenAI(\n",
    "    streaming=True,\n",
    "    callbacks=[StreamingStdOutCallbackHandler()],\n",
    "    temperature=0,\n",
    "    openai_api_key=openai_api_key,\n",
    ")\n",
    "\n",
    "question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
    "doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n",
    "\n",
    "qa = ConversationalRetrievalChain(\n",
    "    retriever=vectorstore.as_retriever(),\n",
    "    combine_docs_chain=doc_chain,\n",
    "    question_generator=question_generator,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The president said that Ketanji Brown Jackson is one of the nation's top legal minds and a former top litigator in private practice, and that she will continue Justice Breyer's legacy of excellence."
     ]
    }
   ],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Justice Breyer"
     ]
    }
   ],
   "source": [
    "chat_history = [(query, result[\"answer\"])]\n",
    "query = \"Did he mention who she suceeded\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f793d56b",
   "metadata": {},
   "source": [
    "## get_chat_history Function\n",
    "You can also specify a `get_chat_history` function, which can be used to format the chat_history string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "a7ba9d8c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def get_chat_history(inputs) -> str:\n",
    "    res = []\n",
    "    for human, ai in inputs:\n",
    "        res.append(f\"Human:{human}\\nAI:{ai}\")\n",
    "    return \"\\n\".join(res)\n",
    "\n",
    "\n",
    "qa = ConversationalRetrievalChain.from_llm(\n",
    "    llm, vectorstore.as_retriever(), get_chat_history=get_chat_history\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "a3e33c0d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_history = []\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "result = qa({\"question\": query, \"chat_history\": chat_history})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "936dc62f",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and a former top litigator in private practice, and that she will continue Justice Breyer's legacy of excellence.\""
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"answer\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8c26901",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
